Minimum recognition unit of a PEM mucin tandem repeat specific monoclonal antibody and detection method using same

ABSTRACT

The peptide EPPT (Glu-Pro-Pro-Thr) (SEQ ID NO:1) selectively binds a mucin expressed by epithelial tumors. It may be incorporated into larger molecules, such as peptides consisting of the sequence EPPT and further amino acids to form a peptide of up to 30 amino acids, and may be radiolabelled or used to guide toxins, etc. to cells expressing the mucin.

The present invention relates to immunoreactive compounds; more specifically, it relates to genetically engineered antibodies.

Antibodies (Abs) are key molecules of the immune system. They provide defence against infection by microbial agents and are involved in a host of other immune reactions such as autoimmunity, allergies, inflammation, and graft rejection. Abs are unique in their specificity and are able to distinguish between very similar antigenic determinants of antigens. Because of this property, among others, antibodies are invaluable reagents for detecting, localizing and quantifying antigens.

Abs were initially obtained from immunized animals, but since different anti-sera represent different pools of heterogeneous Abs of varying specificities and isotypes, it was difficult to carry out reproducible studies using sera as the source of antibodies. It was then realized that large quantities of homogeneous Abs are produced in multiple myeloma, a tumor of plasma cells. Much of the information about the structure of immunoglobulins (Igs) was derived from studies using myeloma proteins. With the development of hybridoma technology, it became possible to generate an essentially endless supply of monoclonal antibodies (MAbs) of the desired specificities. Antibodies are formed by polypeptide chains held together by non-covalent forces and disulfide bridges. A pair of identical light (L) chains (214 amino acids long) is linked to two identical heavy (H) chains to form a bilaterally symmetric structure (FIG. 1).

The polypeptide chains are folded into globular domains separated by short stretches of peptide segments; the H chain has four or five domains, depending on the isotype, and the L chain has two. The N-terminal portion of each chain constitutes the variable region (V_(H), V_(L)). A V_(H) - V_(L) pair carries the antigen combining site and contributes to antibody specificity. The rest of the chain forms the C region, the region of the molecule responsible for effector functions, such as F_(c) receptor binding, complement fixation, catabolism and placental transport. Igs with different C regions and therefore of differing isotypes (in human they are IgM, IgD, IgG1-4, IgA2, and IgE) exhibit different biological properties. In most isotypes a hinge region separates C_(H) 1 and C_(H) 2 and provides the molecule with segmental flexibility. The enzyme papain cleaves near the hinge to generate the F_(ab) and F_(c) portions of the antibody molecule.

Murine MAbs are invaluable in research, but human Igs would be preferable for many applications, such as diagnosis and immune therapy, as they may interact more effectively with the patient's immune system. Because of the species difference, mouse Abs administered to humans can induce an immune response resulting in allergy, serum sickness or immune complex disease. These effects preclude repeated administration of MAb. It has also been demonstrated in clinical trials that host Abs neutralize the injected mouse MAbs and account for their rapid clearance. While mouse and rat hybridomas are easy to produce, attempts to make human MAbs have met with limited success. Mouse-human hybridomas are frequently genetically unstable, and the production of human/human hybrids has been hampered by the lack of suitable immortalized human cell lines and immunized human B cells. Due to ethical considerations, in vivo immunization of humans is very restricted.

Gene transfection provides an alternative method of producing MAbs. With this method it is now possible to produce not only wild-type Ig chains, but also novel Igs and mutants that have been constructed in vitro. Antibodies of the desired specificities, binding affinities, isotypes and species origin can be obtained by transfecting the appropriate genes into mouse myeloma or hybridoma cells in culture. Gene transfections circumvent many of the problems inherent in the hybridoma methodology. Since human or chimeric antibodies with the human constant (C) regions can be produced, the problem of immunogenicity can be avoided or minimized. Another advantage over mouse-human or human-human hybridomas is that the transfected mouse cells can be injected intra-peritoneally into mice, where they will proliferate and produce ascites from which large quantities of antibodies can be isolated.

The first chimeric mouse-human antibody made use of the rearranged and expressed V region genes from the myeloma S107 specific for phosphocholine. The V_(H) was joined to human C_(γ1) or C_(γ2), and V_(L) was joined to human C_(κ) [1]. When expressed together in the same cell, the heavy and light chains assembled into H₂L₂ tetramers that were secreted. This antibody bound antigen, and reaction with three monoclonal anti-idiotope antibodies verified that the polypeptide chains had folded appropriately to reproduce the antigen binding domains.

Experiments from the laboratory of Hozumi and his coworkers demonstrated that transfection of the rearranged murine TNP-specific μ and κ genes into plasmacytoma and hybridoma lines resulted in the production of pentameric IgM that bound hapten and triggered complement-dependent hemolysis [2, 3]. The TNP V_(H) and V_(L) genes were linked to human C_(μ) and C_(κ) segments, respectively, to produce chimaeric IgM that again exhibited the properties of the wild-type mouse antibody [3, 4].

Two mouse-human chimaeric IgEs were produced that could trigger degranulation of mast cells when cross-linked by antigen on the cell membrane [5, 6].

To explore the potential of chimaeric antibodies in cancer therapy, an antibody was made that consists of the V regions derived from the mouse MAb 17-1A, which recognizes a tumor-associated surface Ag, and human C_(γ3) [7]. This chimaeric antibody had the same binding properties as the original mouse antibody.

Another chimaeric antibody with specificity for the surface antigen associated with certain human carcinomas was found to bind to human carcinoma cells [8].

Thus, murine variable regions were combined with human constant regions. A further step in the humanization of rodent antibodies was the synthesis of hybrid variable regions in which the framework residues are of human origin and the complementarity-determining regions (CDRs) come from a mouse antibody. In the case of an NP-specific antibody, it was shown that the transfer of the CDRs from an anti-NP hybridoma onto the human framework of a human myeloma protein resulted in the transfer of antigen-specificity [9]. This result, obtained with a hapten antigen, was extended to antibodies directed against hen-egg lysozyme as well as a human T cell surface antigen [10, 11]. Thus, CDR grafting not only facilitates the construction of chimaeric antibodies but also allows the production of therapeutic reagents in which only the antibody CDR residues are of non-human origin. CDR3 appears to be particularly important in determining the specificity of the antibody. Thus, Taub et al [52] showed that the specificity of an antibody to the platelet fibrinogen receptor was determined largely by the RYD sequence within the CDR3 of the antibody heavy chain. Williams et al [53] used a synthetic peptide from the light chain CDR2 of an antibody to inhibit the interaction of the antibody with its receptor. Conceptually, there is a "minimum recognition unit" of any given antibody, as discussed by Winter and Milstein [54].

We have now identified the minimum recognition unit of a murine monoclonal antibody specific for a mucin molecule associated with epithelial tumors, namely the amino acid sequence EPPT (Glu-Pro-Pro-Thr) (SEQ ID NO:1).

One aspect of the present invention therefore provides a molecule comprising the amino acid sequence EPPT (SEQ ID NO:1).

All peptides herein are written H₂N ... COOH and the amino acids are the naturally-occurring L isomers.

The molecule of the invention may consist of the sequence EPPT (SEQ ID NO:1). This short peptide may or may not be useful in its own right as a binding entity as discussed below but, if not useful in that way, may be used to prepare longer molecules of the invention which do bind the target entity.

Preferably, the EPPT peptide includes further amino acids extending it in the N-terminal and/or C-terminal direction(s). For example, the peptide may be EPPTRTFAY (SEQ ID NO:2), REPPTRTFAYWG (SEQ ID NO:3) or MYYCAREPPTRTFAYWGQG (SEQ ID NO:4) or any EPPT (SEQ ID NO:1)-containing fragment thereof. The peptide may be inserted in place of or form part of the CDR region (preferably CDR3) of an antibody variable framework region.
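To illustrate what "any EPPT-containing fragment thereof" can cover, the following minimal Python sketch enumerates every contiguous fragment of SEQ ID NO:4 that retains the EPPT core. The function name `eppt_fragments` is hypothetical and is introduced only for illustration; the sketch assumes that "fragment" means a contiguous stretch of the parent sequence.

```python
CORE = "EPPT"                      # SEQ ID NO:1
SEQ_ID_4 = "MYYCAREPPTRTFAYWGQG"   # SEQ ID NO:4, as given in the text


def eppt_fragments(parent: str, core: str = CORE) -> set[str]:
    """Return all contiguous fragments of `parent` that contain `core`.

    Assumption: a "fragment" is a contiguous substring of the parent peptide.
    """
    fragments = set()
    for start in range(len(parent)):
        # A fragment must be at least as long as the core to contain it.
        for end in range(start + len(core), len(parent) + 1):
            fragment = parent[start:end]
            if core in fragment:
                fragments.add(fragment)
    return fragments


fragments = eppt_fragments(SEQ_ID_4)
```

Under this reading, SEQ ID NO:2 (EPPTRTFAY) and SEQ ID NO:3 (REPPTRTFAYWG) are themselves members of the resulting set, as is the bare EPPT core.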

Preferably, the said variable framework region is human.

Thus, the peptide may form part of a (preferably human) antibody. Suitably, the peptide forms part of a complete (preferably human) V_(H) or V_(L) region and may additionally be associated with the remaining parts of the antibody to form a complete (preferably humanized) antibody. Alternatively, the peptide may form part of a smaller fragment of an antibody, such as an F_(ab), (F_(ab))₂, F_(v), scF_(v) or dab molecule. Further, the peptide may be expressed as part of a phage.

The isolated EPPT (SEQ ID NO:1) peptide itself, isolated EPPT (SEQ ID NO:1)-containing peptides of up to 30 amino acids in length, EPPT (SEQ ID NO:1)-containing polypeptides, proteins, antibody fragments and phages and EPPT (SEQ ID NO:1)-containing molecules generally are termed "molecules of the invention".

Such molecules are at their most useful when the EPPT (SEQ ID NO:1) moiety is exposed on the surface of the molecule and is available for interaction with other molecules. However, other such molecules which can be re-arranged in order to expose the EPPT (SEQ ID NO:1) moiety are also encompassed. For example, a molecule of the invention may be expressed as an insoluble protein in bacteria and then refolded in vitro for use in the methods of the invention.

The molecules of the invention may be used for a variety of purposes relating to the study or isolation and purification of the mucin to which they specifically bind and the imaging and treatment of cells exhibiting the mucin. For example, because the mucin is shed by the cell concerned, the molecule may be used in a diagnostic assay, based on blood or serum, for the presence of the cell in the body (eg using a radiolabelled version of the molecule to bind to antigen in the serum to compete for immobilized antigen with antigen-specific antibodies in the serum). In other embodiments, the molecule of the invention is coupled to a scintigraphic radiolabel, a cytotoxic compound or radioisotope, an enzyme for converting a non-toxic prodrug into a cytotoxic drug, a compound for activating the immune system in order to target the resulting conjugate to a desired cell type in the body, for example a tumor cell, or a cell-stimulating compound. Such conjugates have a "binding portion", which consists of the EPPT(SEQ ID NO:1)-containing molecule of the invention, and a "functional portion", which consists of the radiolabel, toxin or enzyme etc.

The molecule of the invention may alternatively be used alone (or, especially to increase in vivo stability, with an inert polypeptide addition, which may or may not form a complete antibody or antibody fragment with the said peptide) in order simply to block the activity of the mucin, particularly by physically interfering with its binding of another compound.

The binding portion and the functional portion of the conjugate (if also a peptide or polypeptide) may be linked together by any of the conventional ways of cross-linking polypeptides, such as those generally described in [18]. For example, one portion may be enriched with thiol groups and the other portion reacted with a bifunctional agent capable of reacting with those thiol groups, for example the N-hydroxysuccinimide ester of iodoacetic acid (NHIA) or N-succinimidyl-3-(2-pyridyldithio)propionate (SPDP). Amide and thioether bonds, for example achieved with m-maleimidobenzoyl-N-hydroxysuccinimide ester, are generally more stable in vivo than disulphide bonds.

The functional portion of the conjugate may be an enzyme for converting a non-toxic prodrug into a toxic drug, for example the conjugates of Bagshawe and his colleagues [19-21] or cyanide-releasing systems [22].

It may not be necessary for the whole enzyme to be present in the conjugate but, of course, the catalytic portion must be present. So-called "abzymes" may be used, where a monoclonal antibody is raised to a compound involved in the reaction one wishes to catalyse, usually the reactive intermediate state. The resulting antibody can then function as an enzyme for the reaction.

The conjugate may be purified by size exclusion or affinity chromatography, and tested for dual biological activities. The peptide immunoreactivity may be measured using an enzyme-linked immunosorbent assay (ELISA) with immobilized antigen and in a live cell radio-immunoassay. An enzyme assay may be used for β-glucosidase using a substrate which changes in absorbance when the glucose residues are hydrolyzed, such as oNPG (o-nitrophenyl-β-D-glucopyranoside), liberating 2-nitrophenol which is measured spectrophotometrically at 405 nm.

Stability of the conjugate may be tested in vitro initially by incubating at 37° C. in serum, followed by size exclusion FPLC analysis. Stability in vivo can be tested in the same way in mice by analyzing the serum at various times after injection of the conjugate. In addition, it is possible to radiolabel the peptide with ¹²⁵I, and the enzyme with ¹³¹I before conjugation, and to determine the biodistribution of the conjugate, free antibody and free enzyme in mice.

Alternatively, the conjugate may be produced as a fusion compound by recombinant DNA techniques whereby a length of DNA comprises respective regions encoding the two portions of the conjugate either adjacent one another or separated by a region encoding a linker peptide which does not destroy the desired properties of the conjugate.

Conceivably, the two functional portions of the compound may overlap wholly or partly. The DNA is then expressed in a suitable host in known ways.

The conjugates may be administered in any suitable way, usually parenterally, for example intravenously, intraperitoneally or, preferably (for bladder cancers), intravesically (i.e. into the bladder), in standard sterile, non-pyrogenic formulations of diluents and carriers, for example isotonic saline (when administered intravenously). Once the conjugate has bound to the target cells and been cleared from the bloodstream (if necessary), which typically takes a day or so, the pro-drug is administered, usually as a single infused dose, or the tumor is imaged. If needed, because the conjugate may be immunogenic, cyclosporin or some other immunosuppressant can be administered to provide a longer period for treatment but usually this will not be necessary.

The timing between administrations of conjugate and pro-drug may be optimized in a non-inventive way since tumor/normal tissue ratios of conjugate (at least following intravenous delivery) are highest after about 4-6 days, whereas at this time the absolute amount of conjugate bound to the tumor, in terms of percent of injected dose per gram, is lower than at earlier times.

Therefore, the optimum interval between administration of the conjugate and the pro-drug will be a compromise between peak tumor concentration of enzyme and the best distribution ratio between tumor and normal tissues. The dosage of the conjugate will be chosen by the physician according to the usual criteria. At least in the case of methods employing a targeted enzyme and intravenous amygdalin as the toxic pro-drug, 1 to 50 daily doses of 0.1 to 10.0 grams per square meter of body surface area, preferably 1.0-5.0 g/m² are likely to be appropriate. For oral therapy, three doses per day of 0.05 to 10.0 g, preferably 1.0-5.0 g, for one to fifty days may be appropriate. The dosage of any conjugate will similarly be chosen according to normal criteria, particularly with reference to the type, stage and location of the tumor and the weight of the patient. The duration of treatment will depend in part upon the rapidity and extent of any immune reaction to the conjugate.
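The per-square-meter figures above translate into a total daily dose by simple multiplication with the patient's body surface area. The following Python sketch shows only that arithmetic; the function names and the example body surface area of 1.8 m² are hypothetical values chosen for illustration, and the sketch is not dosing guidance.

```python
# Dose ranges quoted in the text for intravenous amygdalin, in g per m^2 per day.
BROAD_RANGE_G_PER_M2 = (0.1, 10.0)
PREFERRED_RANGE_G_PER_M2 = (1.0, 5.0)


def total_dose_grams(dose_g_per_m2: float, body_surface_area_m2: float) -> float:
    """Convert a per-area dose (g/m^2) into a total dose (g)."""
    if dose_g_per_m2 <= 0 or body_surface_area_m2 <= 0:
        raise ValueError("dose and body surface area must be positive")
    return dose_g_per_m2 * body_surface_area_m2


def daily_dose_range(body_surface_area_m2: float,
                     per_m2_range: tuple[float, float] = PREFERRED_RANGE_G_PER_M2
                     ) -> tuple[float, float]:
    """Return the (low, high) total daily dose in grams for a given body surface area."""
    low, high = per_m2_range
    return (total_dose_grams(low, body_surface_area_m2),
            total_dose_grams(high, body_surface_area_m2))


# Hypothetical example: an assumed body surface area of 1.8 m^2.
example_low, example_high = daily_dose_range(1.8)
```

For the assumed 1.8 m² the preferred range of 1.0-5.0 g/m² corresponds to roughly 1.8 to 9.0 g per day; the actual dose remains a matter for the physician, as stated above.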

The conjugates, if necessary together with an appropriate pro-drug, are in principle suitable for the destruction of cells in any tumor or other defined class of cells selectively exhibiting the mucin recognized by the EPPT entity. This mucin appears to be expressed by a wide variety of epithelial tumors, including lung and breast tumors and tumors of the urinary tract, especially the bladder. The compounds are principally intended for human use but could be used for treating other mammals, including dogs, cats, cattle, horses, pigs and sheep.

The methods of the invention may be particularly suitable for the treatment of bladder carcinoma in situ, administering the antibody-enzyme conjugate and the amygdalin intravesically. Our studies on the administration of radiolabelled antibodies via this route indicate that high tumor/normal bladder ratios can be achieved, and that the antibody does not enter the circulation. Bladder cancer accounts for 2% of all human malignancies, of which approximately 70% of cases are superficial at the time of diagnosis. Recurrences occur in as many as 80% of cases after surgical resection, 10% of these progressing to a higher grade carcinoma with poorer prognosis.

The functional portion of the conjugate, when the conjugate is used for diagnosis, usually comprises and may consist of a radioactive atom for scintigraphic studies, for example technetium-99m (^(99m)Tc) or iodine-123 (¹²³I), or a spin label for nuclear magnetic resonance (nmr) imaging (also known as magnetic resonance imaging, mri), such as iodine-123 again, iodine-131, indium-111, fluorine-19, carbon-13, nitrogen-15, oxygen-17, gadolinium, manganese or iron.

When used in a compound for selective destruction of the tumor, the functional portion may comprise a highly radioactive atom, such as iodine-131, rhenium-186, rhenium-188 or yttrium-90, which emits enough energy to destroy neighboring cells, or a cytotoxic chemical compound such as methotrexate, adriamycin, vinca alkaloids (vincristine, vinblastine, etoposide), daunorubicin or other intercalating agents.

The radio- or other labels may be incorporated in the conjugate in known ways. For example, the EPPT (SEQ ID NO:1)-containing peptide may be biosynthesized or may be synthesized by chemical amino acid synthesis using suitable amino acid precursors involving, for example, fluorine-19 in place of hydrogen. In such a compound, the CDR3 peptide incorporates the radio-label. Labels such as ^(99m)Tc, ¹²³I, ¹⁸⁶Re, ¹⁸⁸Re and ¹¹¹In can be attached via a cysteine residue in the peptide. Yttrium-90 can be attached via a lysine residue. The IODOGEN method [23] can be used to incorporate iodine-123. Reference [24] describes other methods in detail.

Nucleotide coding sequences encoding the molecules of the invention (when the molecule is a peptide, polypeptide or protein) form further aspects of the invention, as do vectors and expression vehicles comprising such coding sequences, hosts including such vectors or expression vehicles (eg bacteria such as E. coli, yeasts such as S. cerevisiae, mammalian cells such as lymphoid cell lines (eg myeloma cells), and transgenic animals and plants, especially those in which the transgene is targeted for expression in cells other than B lymphocytes), and processes for preparing such molecules by culturing such hosts and isolating the molecules.

Thus, a further aspect of the invention provides a polynucleotide encoding an EPPT(SEQ ID NO:1)-containing peptide, polypeptide, protein or bacteriophage.
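Because the genetic code is degenerate, more than one polynucleotide encodes the EPPT tetrapeptide. The following Python sketch enumerates the possible DNA coding sequences under the standard genetic code; the codon table is deliberately restricted to the three residues that occur in EPPT, and the names `CODONS`, `coding_sequences` and `translate` are hypothetical. It makes no claim about which sequence is present in any particular clone or construct.

```python
from itertools import product

# Standard genetic code, restricted to the residues found in EPPT (DNA sense strand).
CODONS = {
    "E": ("GAA", "GAG"),                # glutamic acid
    "P": ("CCT", "CCC", "CCA", "CCG"),  # proline
    "T": ("ACT", "ACC", "ACA", "ACG"),  # threonine
}

# Reverse lookup: codon -> one-letter amino acid code.
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}


def coding_sequences(peptide: str) -> list[str]:
    """Return every DNA sequence that encodes `peptide` under the table above."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


def translate(dna: str) -> str:
    """Translate a DNA sequence codon by codon using the restricted table."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


eppt_dna = coding_sequences("EPPT")
```

With two codons for Glu and four each for Pro and Thr, there are 2 × 4 × 4 × 4 = 128 twelve-nucleotide sequences that encode EPPT.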

Such polynucleotides may be devised and created by known techniques such as those in the Sambrook et al manual [55]. In the specific case of "phage antibodies" i.e. bacteriophages including a part of an antibody (namely, in this case, the EPPT (SEQ ID NO:1) moiety), the techniques of McCafferty et al [56] may be used. Thus, the EPPT region may be expressed at the N-terminal region of the gene III protein of phage fd, since the gene III protein is normally expressed at the tip of the phage.

More specifically, an expressible polynucleotide coding sequence may be prepared by (i) providing a cell which comprises a first coding sequence encoding an EPPT(SEQ ID NO:1)-containing peptide, polypeptide or protein, (ii) obtaining DNA corresponding to the said coding sequence, and (iii) inserting the said DNA into a suitable expression cassette for the intended host.

Ways of carrying out step (i) are familiar from conventional immunological techniques and monoclonal antibody techniques. Essentially, an animal may be immunized with the antigen of interest, and immune system cells, for example spleen cells, are isolated, optionally followed by immortalization thereof by fusion with myeloma cells. Alternatively, based on more recent techniques, single B lymphocytes or EBV-immortalized cells may be used.

In step (ii) the specific EPPT(SEQ ID NO:1)-encoding DNA (or, more accurately, a copy thereof) may be isolated from the cell by PCR-based techniques, using primers known to be specific for variable framework regions. Further identical versions of such DNA may of course be made by PCR, chemical synthesis, reproduction in vivo or whatever method is convenient.

Step (iii) may be carried out by conventional recombinant DNA ligation techniques, using a promoter and other regulatory sequences suited to the intended host.

Embodiments of the invention and ways of putting the invention into practice will now be described in more detail, with reference to various examples and with reference to the accompanying figures, in which:

FIG. 1 shows the human γ constant region genes cloned as SalI-BamHI cassettes. The C_(H) 1, C_(H) 2 and C_(H) 3 exons of γ₃ are indicated by empty boxes, and those of γ₄ by solid boxes. The hinge region of wild-type γ₃ is encoded by four exons: hinge exon 1 (hatched) is unique; exons 2, 3 and 4 (shaded) are identical. Wild-type γ₄ has a single hinge exon (cross-hatched). Thin arrows point to the StuI site in the intron that was converted to PvuI by the insertion of linkers. Hinge-modified constructs 1 and 2 show γ₃ and γ₄ with reciprocal hinge switches. Constructs 3-6 are modifications within γ₃: 3, γ₃ with hinge exon 1; 4, exon 4; 5, exons 1, 2, 3, 4, 2, 3, 4; and 6, no hinge.

FIG. 2 shows the binding of EPPT (SEQ ID NO:1) peptides to the immobilized mucin receptor peptide; and

FIG. 3 shows the same data as FIG. 2, presented as a bar chart.

I. GENERAL TECHNIQUES

Expression of antibody coding regions in transfected myeloma cells

From 1983, several groups have described the introduction of DNA encoding an immunoglobulin heavy or light chain into a lymphoid cell line [25-28]. In all these cases, the genomic DNA encoding the immunoglobulin polypeptide was cloned into a gpt- or neo-based plasmid vector. The resultant plasmids were then introduced into the lymphoid cell lines either by use of co-precipitation with calcium phosphate or by DEAE-dextran facilitated DNA uptake or by fusing the lymphoid cells with spheroplasts made from the Escherichia coli that harbored the plasmid. Later, electroporation found wide use as an effective means of introducing DNA into a wide variety of cell-lines. From these and subsequent experiments, it is clear that the transfected gene is expressed in a manner appropriate to the cell type. In other words, in myeloma cells the transfected gene is heavily transcribed and good quantities of antibody are secreted; in pre-B and B cell lines, the introduced gene is expressed at a much lower level; and in non-lymphoid lines there is no correct expression of the introduced immunoglobulin gene. It is therefore clear that myelomas provide the ideal hosts for expressing transfected antibody genes as they not only recognize the immunoglobulin gene transcription signals and therefore produce the antibody in abundance but they are also well equipped for protein secretion.

Typically, the genes for the heavy and/or light chain of the desired antibody are cloned into a neo- or gpt-based plasmid. Ig genes to be used for transfection are cloned into eukaryotic expression vectors, the most commonly used of which are the pSV2 plasmids developed by Berg and coworkers [42-44]. These vectors contain a plasmid origin of replication and a marker for selection in bacteria, so large quantities of DNA can easily be obtained for genetic manipulations. Another essential feature of these vectors is a dominant marker selectable in eukaryotic cells; this marker is a bacterial gene transcribed under the control of the SV40 early region promoter. Included in this eukaryotic transcription unit 3' of the bacterial gene are SV40 sequences for splicing and polyadenylation. It is important that these are dominant selectable markers (markers that produce a selectable change in the phenotype of normal cells) so that they can be used in cell lines that have not been drug marked.

One of the selectable markers is the Escherichia coli gene encoding xanthine-guanine phosphoribosyltransferase (gpt). This enzyme, unlike the analogous endogenous enzyme, can use xanthine as a precursor for xanthine monophosphate, and permits cells provided with xanthine to survive in the presence of mycophenolic acid, a drug that blocks purine biosynthesis by preventing the conversion of inosine monophosphate to xanthine monophosphate. A second selectable marker is the neo gene from the transposon Tn5, which encodes a phosphotransferase that can inactivate the antibiotic G418. G418 interferes with the function of the 80S ribosome and blocks protein synthesis in eukaryotic cells.

The two plasmids pSV2gpt and pSV2neo (FIG. 1) contain the pBR322 origin of replication and the beta-lactamase gene for Amp^(R). A more recently developed vector, pSV184neo, contains the Cm^(R) gene and the origin of replication from the plasmid pACYC184 [45]. pBR and pACYC-derived plasmids are compatible, and so both can be propagated non-competitively within a bacterium. These vectors are not known to replicate as episomes in mouse cells but rather integrate into the chromosome. They are useful when stable transfectants are desired as a continuous source of antibody. In addition to the genes described above, pSV5 vectors contain the polyoma virus early region, which enables them to replicate to thousands of copies per mouse cell. The pSV5 vectors have only been employed for transient expression of immunoglobulin genes [46].

The exact extent of immunoglobulin gene DNA sequences that are required for maximal expression has not yet been fully determined. Initial experiments used complete genomic DNA with several kilobases of flanking sequence. However, most of the DNA that constitutes an antibody gene is, in fact, intron and much of this is probably dispensable. The major intron of the mouse heavy chain locus contains a transcription enhancer [29, 30, 26]; this enhancer can be removed from the intron and placed upstream of the immunoglobulin gene. Thus, much of the major intron of the immunoglobulin heavy chain gene can be deleted without any consequent loss of antibody yield after transfection [31]. There is at present no evidence that other introns within an immunoglobulin gene contain signals essential for antibody expression. However, experiments in which transcription of a μ heavy-chain cDNA is driven by a V_(H) promoter/IgH enhancer combination have revealed that good expression requires the presence of an intron although this requirement is not specific for a particular intron [32].

Finally, in the context of immunoglobulin transcription signals, it should be noted that multicopy transfected Ig genes normally yield considerably less secreted antibody than is obtained from the single-copy endogenous gene in hybridomas. That this is also found in analogous experiments using transgenic mice strongly suggests that high level immunoglobulin gene expression requires sequence elements located beyond the region of DNA normally used in transfection experiments and which are therefore at some distance from the constant-region exons.

Expression in transfectants of non-lymphoid cells

Expression of transfected Ig genes in lymphoid cell-lines can also be driven by non-immunoglobulin transcription signals. For example, promoters from a heat-shock gene or from SV40 or human cytomegalovirus have been used successfully [33-35]. The use of these transcription elements that are not lymphoid-specific appears to offer several advantages. The yields of antibody can be as good as those presently obtained with the V_(H) promoter/IgH enhancer. (This, however, may reflect that the segments of genomic Ig genes used lack important transcription signals.) Furthermore, some viral transcription elements can allow good expression from Ig cDNA constructs without manifesting an intron requirement; this may prove of great use in the synthesis of modified antibodies or antibody fragments.

The use of viral and heat-shock promoters has allowed the synthesis of antibody by transfectants of non-lymphoid cell-lines to be evaluated. Success has been achieved with both IgM and IgG antibodies in non-lymphoid mammalian cell lines [33-35]. Indeed, if the pattern of glycosylation is not severely affected, non-lymphoid hosts may be used for the expression of engineered antibodies.

Expression in transgenic animals

The introduction of immunoglobulin gene DNA into the mouse germline [36] illustrates the use of transgenic animals for the production of monoclonal antibodies. The gene for a chimaeric human IgA2 antibody may be introduced into the mouse germline [39]. The transgenic mice contain good levels of the chimaeric antibody in serum (about 100 μg/ml) and the antibody is also secreted in colostrum and milk.

Expression in Escherichia coli and yeast

References 38 and 39 disclose bacterial expression systems for the synthesis of antibody fragments (F_(v), F_(ab) and the F_(c) of IgE). Thus, there are now many expression systems available for the production of engineered antibodies although the technology will obviously continue to develop.

Making chimaeric antibodies with human effector functions

The in vitro manipulation of immunoglobulin gene DNA prior to its introduction into myeloma cells allows the production of chimaeric antibodies which contain mouse or rat antigen-binding variable (V) regions linked to human constant (C) regions. In order to construct such antibodies, a mouse or rat hybridoma specific for the desired antigen is made using the standard procedures; the expressed V region genes of the hybridoma are then isolated, joined to human C region genes by in vitro DNA recombination and a plasmid containing the genes for this chimaeric antibody is then introduced into a myeloma cell line. In this way, chimaeric human IgM, IgG or IgE antibodies have been made that are specific for TNP, phosphocholine or NP [40, 41, 31].

Methods of cloning immunoglobulin genes for expression

Rearranged Ig variable region genes can be isolated from genomic libraries of hybridomas by using the appropriate DNA probes.

In most cases the cloning strategy takes advantage of the fact that both the heavy and light chain variable regions must be joined by a J region before they can be expressed. J region probes can therefore be used to distinguish the expressed variable regions from the hundreds of nonexpressed variable regions. This approach is frequently complicated by the presence of aberrantly rearranged variable regions, and a secondary assay must be used to distinguish aberrant from productive rearrangements [8]. Variable regions cloned from genomic DNA are usually expressed using their own promoter regions.

It is also possible to express rearranged V region genes that have been cloned from cDNA libraries [47, 48]. Two approaches have been used to express cDNA. In one approach, the cDNAs were used to construct a variable region identical to a genomic clone, with a human Ig promoter used for expression [47]. In a second approach, in vitro mutagenesis was used to make the Ig region suitable for expression from an SV40 promoter [48]. Both approaches provide alternatives to genomic cloning for expressing the desired variable regions.

The production of a functional antibody requires the synthesis and proper assembly of both H and L chains. For gene transfection both genes can be cloned into one vector [3]; however, this is technically difficult because a large plasmid offers few unique restriction sites. Therefore, smaller plasmids are preferable for genetic manipulations and DNA preparation.

A more practical approach has been to clone the H and L chains into two separate plasmids (FIG. 1). For example, the H chain gene is introduced into pSV2-gpt and the L chain into pSV184-neo (described above). Both plasmids are then transfected into Escherichia coli and amp^(R) Cm^(R) clones are isolated. Using protoplast fusion, both vectors are simultaneously transfected into the recipient cell in a single step. With the two chains on different plasmids, alterations within the gene of one chain can be made independently of the other, and the transfection of different H and L chain combinations is facilitated.

It is useful to design gene `cassettes` that make it convenient to shuffle exons of complete genes or to join a V gene to different C regions, and vice versa. For this, linkers can be used to introduce unique restriction sites in the genes. In the original vectors for the expression of chimaeric Igs, the constant regions were constructed as a SalI-BamHI cassette. In later constructions, unique PvuI sites have been placed within the intervening sequences, separating each human γ constant region domain (lower portion of FIG. 1); placing linkers within the intervening sequences avoids disrupting the translational reading frame. The presence of unique restriction sites between exons makes it much more straightforward to shuffle exons.

Ig chains have been produced in which V_(H) is attached to C_(L) and V_(L) is attached to C_(H) [49-51]. These molecules assembled, were secreted, and, when containing the appropriate variable regions, bound antigen. These light-chain heterodimers potentially provide antigen binding capacity devoid of effector function. These and similar molecules that do not occur in vivo may be modified in accordance with the invention.

II. SPECIFIC EXAMPLES

EXAMPLE 1: ISOLATION OF THE VARIABLE DOMAINS OF CLONE B

Clone B is a lymphoblastoid cell line (secreting antibody directed against a tumor-associated mucin molecule) derived by EBV transformation and cloning of a patient's peripheral blood B-cells. After DNA isolation, the polymerase chain reaction (PCR) was employed, using oligonucleotide primers specific for the variable light and heavy chains of immunoglobulins (Table 1).

TABLE 1
__________________________________________________________________________
Oligonucleotide Primers
For variable domain: Heavy chain (V_(H))
Primer name        Primer
__________________________________________________________________________
V_(H) EcoRI For    CTCGAATTCTGAGGAGACGGTGACCGTGGTCCCTTGGCCCC (SEQ ID NO 5)
V_(H) Bam Back     ATCGGATCCAGGTSMARCTGCAGSAGTCWGG (SEQ ID NO 6)
__________________________________________________________________________
(where S = C or G, M = A or C, R = A or G and W = A or T)

The forward primer contains an EcoRI site and a BstEII site. The back primer contains a BamHI site and a PstI site.
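The restriction sites named above can be checked directly against the primer sequences of Table 1. The following is a minimal sketch, not part of the original procedure: the recognition sequences are the standard ones for these enzymes (N in the BstEII site stands for any base), and the names `SITES`, `PRIMERS`, `find_sites` and `degeneracy` are illustrative only.

```python
import re

# Standard recognition sequences (5'-3'); N in the BstEII site is any base.
SITES = {
    "EcoRI": "GAATTC",
    "BstEII": "GGTNACC",
    "BamHI": "GGATCC",
    "PstI": "CTGCAG",
}

# Primers as printed in Table 1 (S, M, R and W are two-fold degenerate codes).
PRIMERS = {
    "VH EcoRI For": "CTCGAATTCTGAGGAGACGGTGACCGTGGTCCCTTGGCCCC",
    "VH Bam Back": "ATCGGATCCAGGTSMARCTGCAGSAGTCWGG",
}


def find_sites(primer):
    """Return {enzyme: 0-based start} for each site found in the primer.

    Degenerate positions in the primer never match, so a site is only
    reported when every primer variant carries it.
    """
    hits = {}
    for enzyme, site in SITES.items():
        pattern = site.replace("N", "[ACGT]")
        match = re.search(pattern, primer)
        if match:
            hits[enzyme] = match.start()
    return hits


def degeneracy(primer):
    """Number of distinct sequences encoded by the two-fold codes S, M, R, W."""
    return 2 ** sum(primer.count(code) for code in "SMRW")


for name, seq in PRIMERS.items():
    print(name, find_sites(seq), "variants:", degeneracy(seq))
```

Because the degenerate positions of the back primer lie outside the BamHI and PstI sites, every variant of that primer carries both sites.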

The isolated DNA was assayed by agarose gel electrophoresis and found to be 350 base pairs in size. The DNA encoding the V_(H) region was gene-cleaned and ligated into a plasmid (pUC18). A single colony of TG-1 bacteria containing plasmid was isolated. This colony was expanded and a mini-prep of DNA obtained.

From this mini-prep the V_(H) gene of the human antibody, which we now designate as clone B, was isolated using restriction enzyme digestion. The gene encoding the V_(H) region was then ligated into a sequencing phage (M13mp18). TG-1 bacterial cells were transformed with phage and grown, and single stranded DNA was isolated. The single stranded DNA was sequenced, using the Sequenase reaction, and run on a 6% acrylamide gel (0.4 mm thick).

Sequencing of the gene encoding the V_(H) region of clone B provided a sequence consistent with the Kabat Human Heavy chain subgroup II classification (Table 2).

TABLE 2
__________________________________________________________________________
HUMAN ANTI-MUCIN MoAb: DNA (SEQ ID NO 7) & AMINO ACID SEQUENCE (SEQ ID NO 8)
__________________________________________________________________________
[Sequence image not reproduced; see SEQ ID NOS 7 and 8 in the Sequence Listing.]
__________________________________________________________________________

EXAMPLE 2: ISOLATION OF THE VARIABLE DOMAINS OF NM-2

The antibody NM-2 is a murine monoclonal antibody of class IgG1 with a lambda light chain, which has specificity for the mucin molecule. The antibody reacts with about 95% of epithelial tumors and cross-reacts with normal mucin.

Cloning of the variable domains of NM-2. Both the variable heavy and light chains have been cloned and sequenced using the techniques outlined in Example 1. A full list of primers and their sequences is given in Table 3.

TABLE 3
__________________________________________________________________________
Primers used for the isolation and sequencing of NM-2 V_(H) and V_(L) genes.
Primer name        Primer sequence
__________________________________________________________________________
i) For variable heavy chain gene (V_(H))
V_(H) EcoRI For    CTCGAATTCTGAGGAGACGGTGACCGTGGTCCCTTGGCCCC (SEQ ID NO 5)
V_(H) Bam Back     ATCGGATCCAGGTSMARCTGCAGSAGTCWGG (SEQ ID NO 6)
ii) For variable light chain gene (V_(L))
V_(L) Back Eco     CAGGCTGTTGTGACTCAGGAATTCGCACTCACC (SEQ ID NO 9)
V_(L) For Xba      ACCTAGTCTAGACAGTTTGGTTCCTCCACC (SEQ ID NO 10)
__________________________________________________________________________
NOTE: All primer sequences are 5'-3'.

Having sequenced both the heavy (V_(H)) and light (V_(L)) chain genes for NM-2, we now know all of its CDR sequences (Tables 4 & 5).

TABLE 4
__________________________________________________________________________
DNA (SEQ ID NO 11) AND AMINO ACID SEQUENCE (SEQ ID NO 12) OF MONOCLONAL ANTIBODY NM-2 V_(H) DOMAIN
__________________________________________________________________________
[Sequence image not reproduced; see SEQ ID NOS 11 and 12 in the Sequence Listing.]
__________________________________________________________________________

TABLE 5
__________________________________________________________________________
NM2LAMBDA (DNA sequence is SEQ ID NO 13 and amino acid sequence is SEQ ID NO 14)
__________________________________________________________________________
[Sequence image not reproduced; see SEQ ID NOS 13 and 14 in the Sequence Listing.]
__________________________________________________________________________

Six peptides which correspond to the CDRs of NM-2's variable heavy chain have been synthesized (Table 6).

TABLE 6
__________________________________________________________________________
No.  Peptide  Sequence
__________________________________________________________________________
1.   CDR1A    NH2-SLTSYGVHWVR-COOH (SEQ ID NO 15)
2.   CDR3A    NH2-YCAREPPTRTFAYWGQG-COOH (SEQ ID NO 16)
3.   CDR3B    NH2-MYYCAREPPTRTFAYWGQG-COOH (SEQ ID NO 4)
4.   CDR3D    NH2-EPPTRTFAY-COOH (SEQ ID NO 2)
5.   CDR3D    NH2-REPPTRTFAYWG-COOH (SEQ ID NO 3)
6.   CDR2A    NH2-WLVVIWSDGSTTYNSALNSRCM-COOH (SEQ ID NO 17)
__________________________________________________________________________

The CDR3 peptides with the amino acid core sequence EPPT (SEQ ID NO:1) showed antigen-binding specificity. A particularly interesting finding was that the amino acid sequence EPPT was present in the CDR3 of both the murine (NM-2) and the human (clone B) antibodies.
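The shared EPPT motif can be confirmed by translating the CDR3-encoding stretches of the two V_(H) genes. The sketch below is illustrative only: it uses the standard genetic code, the two DNA fragments are copied from SEQ ID NO 7 and SEQ ID NO 11 of the Sequence Listing starting at the Ala codon that follows the conserved Cys, and the names `CODON_TABLE`, `translate` and `CDR3_DNA` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate DNA in frame 1, ignoring any incomplete trailing codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Fragments copied from the Sequence Listing (region following Tyr-Tyr-Cys).
CDR3_DNA = {
    "clone B (SEQ ID NO 7)": "GCCAGAGAGCCTCCCACGACGTAC",
    "NM-2 (SEQ ID NO 11)": "GCCAGAGAGCCTCCCACACGTACGTTTGCCTACTGGGGCCAAGGG",
}

for name, dna in CDR3_DNA.items():
    protein = translate(dna)
    # str.find is 0-based; add 1 to report a residue position in the fragment.
    print(name, protein, "EPPT starts at residue", protein.find("EPPT") + 1)
```

In both fragments the motif begins at the third residue, immediately after Ala-Arg.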

CDR3 grafting. The EPPT (SEQ ID NO:1) sequence was used to replace the 5' end of CDR3 of both human and murine antibodies. These novel antibodies were shown to possess anti-mucin specificity. Thus we have shown that grafting of CDR3 alone is sufficient to endow anti-tumor specificity. In at least some antibodies, this observation obviates the need to graft all the CDRs, i.e. CDR1 and CDR2 in addition to CDR3 (as previously described by G. Winter and colleagues).

EXAMPLE 3: BINDING OF MOLECULES OF THE INVENTION TO RADIOLABELLED MUCIN PEPTIDE

Preparation of immobilized mucin receptor peptide. 7.5 mg of YVTSAPDTRPAPGST (SEQ ID NO 18) (the epitope sequence from the mucin molecule) was dissolved in 100 mM sodium phosphate buffer pH 8. This was mixed with 7.5 mg of bovine serum albumin (BSA) and made up to 0.5 ml with buffer. 5 μl of 25% glutaraldehyde was added, mixed and left to stand at room temperature for 15 minutes. A further 2.5 μl of glutaraldehyde was then added and the mixture left for a further 15 minutes.

Following the incubation, 100 μl of 1M glycine pH 6 was added and left for 10 minutes to quench the glutaraldehyde. The reaction mixture was then aliquoted into 100 μl portions and stored at -20° C. until use.
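The peptide content of each stored aliquot follows from the volumes used in the conjugation. The sketch below is an illustration only; it assumes that the volumes are additive, that the second glutaraldehyde addition is 2.5 μl and the glycine addition 100 μl, and that no peptide is lost during the reaction. All variable names are hypothetical.

```python
# Mass of mucin epitope peptide taken into the conjugation (mg).
peptide_mg = 7.5

# Volumes added in the procedure (microlitres); assumed to be additive.
volumes_ul = {
    "peptide + BSA made up in buffer": 500.0,
    "25% glutaraldehyde, first addition": 5.0,
    "glutaraldehyde, second addition": 2.5,
    "1M glycine quench": 100.0,
}

total_ul = sum(volumes_ul.values())
peptide_mg_per_ml = peptide_mg / (total_ul / 1000.0)

# A 100 microlitre aliquot holds one tenth of the per-millilitre amount.
aliquot_ul = 100.0
peptide_ug_per_aliquot = peptide_mg_per_ml * aliquot_ul

print(f"final volume: {total_ul:.1f} ul")
print(f"peptide concentration: {peptide_mg_per_ml:.2f} mg/ml")
print(f"peptide per {aliquot_ul:.0f} ul aliquot: {peptide_ug_per_aliquot:.0f} ug")
```

Under these assumptions each 100 μl aliquot contains a little over 1.2 mg of receptor peptide (with respect to peptide, not conjugate).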

Radiolabelling of EPPT (SEQ ID NO:1) peptides prior to binding assays.

Peptides were dissolved in phosphate buffered saline, 100 mM sodium phosphate pH8, or 5% DMSO in sodium phosphate. All peptides were labelled with I¹²⁵ using the Iodogen reaction. Depending on solubility up to 1 mg of peptide was radiolabelled with approximately 137MBq of I¹²⁵ in a volume of 200 μl placed in an Iodogen tube (20 μg of Iodogen per tube) for varying times up to 60 minutes at room temperature. The Iodogen tubes were then washed with 1 ml of buffer and eluted through Sephadex G10 columns (5 ml bed volume) which had been previously equilibrated with the buffer. 1 ml fractions were collected from the eluate and assayed for radioactivity.

For example, 200 μl (1 mg) of peptide REPPTRTFAYWG (SEQ ID NO. 3) plus 10 μl (37 MBq) I¹²⁵ were mixed in an Iodogen tube for 60 minutes at room temperature. The fractions collected from subsequent elution through a G10 column were as follows:

TABLE 7
______________________________________
Fraction      Radioactivity (MBq)
______________________________________
Background    0
1             0
2             2.03
3             8.92
4             6.33
5             4.05
6             3.14
7             2.61
8             2.18
9             1.77
10            1.42
11            1.23
12            0.94
______________________________________

Total activity in samples 2-12 inclusive (equivalent to 1 mg of peptide) = 34.62 MBq. Therefore, sample 3 contains (8.92/34.62) × 1 mg = 0.258 mg of peptide in its 1 ml fraction, i.e. 0.258 mg.ml⁻¹.
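The same calculation can be applied to every fraction of Table 7. The following sketch assumes, as the worked example does, that the eluted radioactivity is distributed in proportion to the peptide and that the whole 1 mg of peptide is recovered in fractions 2-12; the names used are illustrative.

```python
# Radioactivity of each 1 ml fraction from Table 7 (MBq).
activities_mbq = {
    2: 2.03, 3: 8.92, 4: 6.33, 5: 4.05, 6: 3.14, 7: 2.61,
    8: 2.18, 9: 1.77, 10: 1.42, 11: 1.23, 12: 0.94,
}

TOTAL_PEPTIDE_MG = 1.0    # peptide taken into the labelling reaction
FRACTION_VOLUME_ML = 1.0  # each collected fraction

total_mbq = sum(activities_mbq.values())


def peptide_mg_per_ml(fraction):
    """Peptide concentration of a fraction, apportioned by its activity."""
    share = activities_mbq[fraction] / total_mbq
    return share * TOTAL_PEPTIDE_MG / FRACTION_VOLUME_ML


print(f"total activity: {total_mbq:.2f} MBq")
for fraction in sorted(activities_mbq):
    print(f"fraction {fraction:2d}: {peptide_mg_per_ml(fraction):.3f} mg/ml")
```

The figure for fraction 3 reproduces the 0.258 mg.ml⁻¹ quoted above, and the per-fraction values sum to the 1 mg of peptide labelled.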

Binding of EPPT (SEQ ID NO:1) peptides to immobilized mucin receptor peptide.

The technique of equilibrium dialysis was used to determine the binding of EPPT peptides to the immobilized mucin receptor peptide. This technique utilizes two chambers separated by a dialysis membrane such that small molecules (less than 12,000-14,000 daltons) are capable of freely equilibrating between the two chambers. The receptor is confined to one half of the dialysis chamber. Once the peptide has bound to its receptor, it can no longer exert an osmotic effect in the system and there is a subsequent shift in the equilibrium such that the count rate is higher in the receptor chamber than in the opposing chamber. Samples are removed from both sides of the membrane and assayed for the presence of radioactivity.

For the purposes of these studies, a 1 ml dialysis module was used, separated into equal parts by a dialysis membrane. Immobilized receptor peptide was added to one chamber at a final concentration of 83.3 μg (with respect to peptide). Into both chambers were added varying concentrations of the EPPT peptides, and the incubation was left to equilibrate for 24-48 hours at room temperature with constant rotation of the dialysis module. In addition, some modules were set up with similar concentrations of EPPT (SEQ ID NO:1) peptide (labelled with I¹²⁵), but with a ten-fold excess of unlabelled ("cold") peptide for the purpose of determining non-specific binding (nsb).

Following incubation and equilibration, samples were removed from the dialysis module and counted for the presence of I¹²⁵. The radioactivity was then converted to amount of peptide for the purposes of determining the amount bound to the immobilized receptor peptide.
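The conversion from chamber counts to bound peptide can be sketched as follows. This is an illustration under stated assumptions, not the calculation actually used: each chamber is taken as 0.5 ml (half of the 1 ml module), the specific activity is taken from the labelling example (34.62 MBq per mg), and the chamber activities in the usage lines are invented purely to show the arithmetic. All function names are hypothetical.

```python
CHAMBER_VOLUME_ML = 0.5  # 1 ml module divided into two equal chambers


def activity_to_ug(activity_mbq, specific_activity_mbq_per_mg):
    """Convert measured radioactivity into micrograms of labelled peptide."""
    return activity_mbq / specific_activity_mbq_per_mg * 1000.0


def bound_ug(receptor_side_ug_per_ml, free_side_ug_per_ml,
             chamber_volume_ml=CHAMBER_VOLUME_ML):
    """Peptide bound to receptor: excess of receptor chamber over free chamber.

    At equilibrium the free peptide concentration is equal on both sides,
    so any excess in the receptor chamber is attributed to bound peptide.
    """
    return (receptor_side_ug_per_ml - free_side_ug_per_ml) * chamber_volume_ml


def specific_binding_ug(total_bound_ug, nonspecific_bound_ug):
    """Specific binding: total binding minus binding with excess cold peptide."""
    return total_bound_ug - nonspecific_bound_ug


# Hypothetical example values (micrograms per ml in each chamber).
total = bound_ug(receptor_side_ug_per_ml=30.0, free_side_ug_per_ml=20.0)
nsb = bound_ug(receptor_side_ug_per_ml=21.0, free_side_ug_per_ml=20.0)
print("total bound (ug):", total)
print("non-specific (ug):", nsb)
print("specific (ug):", specific_binding_ug(total, nsb))
print("1 MBq corresponds to (ug):", round(activity_to_ug(1.0, 34.62), 1))
```

Subtracting the non-specific figure from the total isolates the binding that is displaced by the ten-fold excess of cold peptide.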

FIGS. 2 and 3 relate to the binding of the EPPT (SEQ ID NO:1) peptides to the mucin receptor peptide. One figure shows the binding at each concentration tested for each peptide, whilst the bar graph shows the amount of uptake at one specific concentration of peptide, for easier comparison of the four peptides so far tested. Peptide EPPTRTFAY (SEQ ID NO. 2) shows no real binding to the mucin receptor peptide, and we believe that this reflects the fact that the charged amino group must be distanced from the glutamyl residue within the binding site.

The peptide EPPTRTFAY (SEQ ID NO. 2) comprises the whole of the V_(H) CDR-3 domain without any residues from the framework region and does not itself appear to bind the mucin receptor peptide. When three residues are added to this sequence from the adjoining framework region (ARG to the N-terminus, TRP and GLY to the C-terminus), binding is restored, and it is improved by further addition of amino acid residues. We believe that the flanking residues do not have to be those from the framework region: provided the amino group of the N-terminus is distanced from the glutamic acid residue, then the molecule is capable of binding.
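The distance between the N-terminal amino group and the glutamic acid residue can be expressed as the number of residues preceding the EPPT motif in each CDR3 peptide of Table 6. The short sketch below simply counts those residues; the function name `n_terminal_flank` and the dictionary layout are illustrative, and the code makes no claim about binding beyond what is reported above.

```python
# CDR3-derived peptides from Table 6, keyed by sequence identifier.
PEPTIDES = {
    "SEQ ID NO 2": "EPPTRTFAY",
    "SEQ ID NO 3": "REPPTRTFAYWG",
    "SEQ ID NO 16": "YCAREPPTRTFAYWGQG",
    "SEQ ID NO 4": "MYYCAREPPTRTFAYWGQG",
}


def n_terminal_flank(peptide, motif="EPPT"):
    """Number of residues between the N-terminus and the motif, or None."""
    index = peptide.find(motif)
    return None if index < 0 else index


for name, seq in PEPTIDES.items():
    flank = n_terminal_flank(seq)
    exposed = "yes" if flank == 0 else "no"
    print(f"{name}: {flank} residue(s) before EPPT; Glu at N-terminus: {exposed}")
```

Only SEQ ID NO 2 carries the glutamic acid at the free N-terminus, which is the peptide reported above as showing no real binding.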

REFERENCES (all incorporated herein by reference)

1. Morrison, S. L. et al. Proc. Natl. Acad. Sci. USA 81, 6851-6855.

2. Ochi, A. et al. Nature (London) 302, 340-342.

3. Ochi, A. et al. Proc. Natl. Acad. Sci. USA 80, 6351-6355.

4. Boulianne, G. et al. J. Mol. Biol. Med. 4, 37-49.

5. Boulianne, G. L. et al. Nature (London) 312, 643-646.

6. Liu, F. R. and Gritzmacher, C. A. (1987) J. Immunol. 138, 324-329.

7. Sun, L. K. et al (1986) Hybridoma 5 (Suppl. 1), S17-S20.

8. Sahagan, B. G. et al (1986) J. Immunol. 137, 1066-1074.

9. Jones, P. T. et al (1986) Nature 321, 522-525.

10. Verhoeyen, M. et al (1988) Science 239, 1534-1536.

11. Riechmann, L. et al (1988) Nature 332, 323-327.

18. O'Sullivan et al (1979) Anal. Biochem. 100, 100-108.

19. Bagshawe (1987) Br. J. Cancer 56, 531.

20. Bagshawe et al (1988) Br. J. Cancer 58, 700.

21. WO 88/07378.

22. Rowlinson-Busza et al (in press, "In Vitro cytotoxicity following specific activation of amygdalin by antibody-conjugated β-glucosidase").

23. Fraker, P. J. et al (1978) Biochem. Biophys. Res. Commun. 80, 49-57.

24. "Monoclonal Antibodies in Immunoscintigraphy", J-F. Chatal (CRC Press, 1989).

25. Rice, D. and Baltimore, D. (1982) Proc. Natl. Acad. Sci. USA 79, 7862-7865.

26. Neuberger, M. S. (1983) EMBO J. 2, 1373-1378.

27. Ochi, A. et al (1983) Proc. Natl. Acad. Sci. USA 80, 6351-6355.

28. Oi, V. T. et al (1983) Proc. Natl. Acad. Sci. USA 80, 825-829.

29. Banerji, J. et al (1983) Cell 33, 729-740.

30. Gillies, S. D. et al (1983) Cell 33, 717-728.

31. Neuberger, M. S. et al (1985) Nature 314, 268-270.

32. Neuberger, M. S. and Williams, G. T. (1988) Nucl. Acids Res. 16.

33. Cattaneo, A. and Neuberger, M. S. (1987) EMBO J. 6, 2753-2758.

34. Weidle, U. H. et al (1987) Gene 51, 21-29.

35. Whittle, N. et al (1987) Protein Engineering 1, 499-505.

36. Ritchie, K. A. et al (1984) Nature 312, 517-521.

37. Neuberger, M. S., Caskey, H. M., Petterssen, S., Williams, G. T. and Surani, M. A. (1988)

38. Skerra, A. and Plückthun, A. (1988) Science 240, 1038-1041.

39. Better, M. et al (1988) Science 240, 1041-1043.

40. Boulianne, G. L. et al (1984) Nature 312, 643-646.

41. Morrison, S. L. et al (1984) Proc. Natl. Acad. Sci. USA 81, 6851-6855.

42. Mulligan, R. C. and Berg, P. (1980) Science 209, 1422-1427.

43. Mulligan, R. C. and Berg, P. (1981) Proc. Nat. Acad. Sci. USA 78, 2072-2076.

44. Southern, P. J. and Berg, P. (1982) J. Molec. Appl. Genet. 1, 327-334.

45. Oi, V. T. and Morrison, S. L. (1986) BioTechniques 4, 214-221.

46. Deans, R. J. et al. Proc. Natl. Acad. Sci. USA 81, 1292-1296.

47. Morrison, S. L. et al (1987) Ann. N. Y. Acad. Sci. 507, 187-198.

48. Liu, A. Y. et al (1987) Proc. Natl. Acad. Sci. USA 84, 3439-3443.

49. Sharon, J. et al (1984) Nature (London) 309, 364-367.

50. Tan, L. K. et al (1985) J. Immunol. 135, 3564-3567.

51. Morrison, S. L. (1985) Science 229, 1202-1207.

52. Taub, R. et al (1989) J. Biol. Chem. 264, 259-265.

53. Williams, W. V. et al (1989) P.N.A.S. (USA) 86, 5537-5541.

54. Winter, G. & Milstein, C. (1991) Nature 349, 293-299.

55. Sambrook, J. et al (1989) "Molecular Cloning: A Laboratory Manual" (2nd Edition), Cold Spring Harbor, N.Y., USA.

56. McCafferty, J. et al (1990) Nature 348, 552-554.

    __________________________________________________________________________     SEQUENCE LISTING                                                               (1) GENERAL INFORMATION:                                                       (iii) NUMBER OF SEQUENCES: 18                                                  (2) INFORMATION FOR SEQ ID NO:1:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 4 amino acids                                                      (B) TYPE: amino acid                                                           (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                           (ii) MOLECULE TYPE: peptide                                                    (iii) HYPOTHETICAL: NO                                                         (iv) ANTI-SENSE: NO                                                            (v) FRAGMENT TYPE: internal                                                    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:                                        GluProProThr                                                                   (2) INFORMATION FOR SEQ ID NO:2:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 9 amino acids                                                      (B) TYPE: amino acid                                                           (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                           (ii) MOLECULE TYPE: peptide                                                    (iii) HYPOTHETICAL: NO                                                         (iv) ANTI-SENSE: NO                                                            (v) FRAGMENT TYPE: 
internal                                                    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:                                        GluProProThrArgThrPheAlaTyr                                                    15                                                                             (2) INFORMATION FOR SEQ ID NO:3:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 12 amino acids                                                     (B) TYPE: amino acid                                                           (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                           (ii) MOLECULE TYPE: peptide                                                    (iii) HYPOTHETICAL: NO                                                         (iv) ANTI-SENSE: NO                                                            (v) FRAGMENT TYPE: internal                                                    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:                                        ArgGluProProThrArgThrPheAlaTyrTrpGly                                           1510                                                                           (2) INFORMATION FOR SEQ ID NO:4:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 19 amino acids                                                     (B) TYPE: amino acid                                                           (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                           (ii) MOLECULE TYPE: peptide                                                    (iii) HYPOTHETICAL: NO                                                         (iv) ANTI-SENSE: NO                         
                                   (v) FRAGMENT TYPE: internal                                                    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:                                        MetTyrTyrCysAlaArgGluProProThrArgThrPheAlaTyrTrp                               151015                                                                         GlyGlnGly                                                                      (2) INFORMATION FOR SEQ ID NO:5:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 41 base pairs                                                      (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                           (ii) MOLECULE TYPE: DNA (genomic)                                              (iii) HYPOTHETICAL: NO                                                         (iv) ANTI-SENSE: NO                                                            (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:                                        CTCGAATTCTGAGGAGACGGTGACCGTGGTCCCTTGGCCCC41                                    (2) INFORMATION FOR SEQ ID NO:6:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 31 base pairs                                                      (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                           (ii) MOLECULE TYPE: DNA (genomic)                                              (iii) HYPOTHETICAL: NO                                                         (iv) ANTI-SENSE: YES                                                 
          (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:                                        ATCGGATCCAGGTSMARCTGCAGSAGTCWGG31                                              (2) INFORMATION FOR SEQ ID NO:7:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 369 base pairs                                                     (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                           (ii) MOLECULE TYPE: DNA (genomic)                                              (iii) HYPOTHETICAL: NO                                                         (iv) ANTI-SENSE: NO                                                            (ix) FEATURE:                                                                  (A) NAME/KEY: CDS                                                              (B) LOCATION: 22..369                                                          (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:                                        AAGATCCAGGTCAAACTGCAGGAGTCTGGACCTGGCCTGGTGGCGCCCTCA51                          GluSerGlyProGlyLeuValAlaProSer                                                 1510                                                                           CAGAGCCTGTCCATCACATGCACCGTCTCAGGGTTCTCATTAACTAGC99                             GlnSerLeuSerIleThrCysThrValSerGlyPheSerLeuThrSer                               152025                                                                         TATGGTGTACACTGGGTTCGCCAGCCTCCAGGAAAGGGTCTGGAGTGG147                            TyrGlyValHisTrpValArgGlnProProGlyLysGlyLeuGluTrp                               303540                                                                         CTGGTAGTGATATGGAGTGATGGAAGCACAACCTATAATTCAGCTCTC195                            
LeuValValIleTrpSerAspGlySerThrThrTyrAsnSerAlaLeu                               455055                                                                         AAATCCAGACTGAGCATCAGCAAGGACAACTCCAAGAGCCAAGTTTTC243                            LysSerArgLeuSerIleSerLysAspAsnSerLysSerGlnValPhe                               606570                                                                         TTAAAAATGAACAGTCTCCAAACTGATGACACAGCCATGTACTACTGT291                            LeuLysMetAsnSerLeuGlnThrAspAspThrAlaMetTyrTyrCys                               75808590                                                                       GCCAGAGAGCCTCCCACGACGTACGTTTGCTTACTGGGGCCAAGGGAC339                            AlaArgGluProProThrThrTyrValCysLeuLeuGlyProArgAsp                               95100105                                                                       ACGGTCACCGTCTCATCAGAATTCGTAATC369                                              ThrValThrValSerSerGluPheValIle                                                 110115                                                                         (2) INFORMATION FOR SEQ ID NO:8:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 116 amino acids                                                    (B) TYPE: amino acid                                                           (D) TOPOLOGY: linear                                                           (ii) MOLECULE TYPE: protein                                                    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:                                        GluSerGlyProGlyLeuValAlaProSerGlnSerLeuSerIleThr                               151015                                                                         CysThrValSerGlyPheSerLeuThrSerTyrGlyValHisTrpVal                               202530                                                                         
ArgGlnProProGlyLysGlyLeuGluTrpLeuValValIleTrpSer
35 40 45
AspGlySerThrThrTyrAsnSerAlaLeuLysSerArgLeuSerIle
50 55 60
SerLysAspAsnSerLysSerGlnValPheLeuLysMetAsnSerLeu
65 70 75 80
GlnThrAspAspThrAlaMetTyrTyrCysAlaArgGluProProThr
85 90 95
ThrTyrValCysLeuLeuGlyProArgAspThrValThrValSerSer
100 105 110
GluPheValIle
115
(2) INFORMATION FOR SEQ ID NO:9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
CAGGCTGTTGTGACTCAGGAATTCGCACTCACC 33
(2) INFORMATION FOR SEQ ID NO:10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
ACCTAGTCTAGACAGTTTGGTTCCTCCACC 30
(2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 321 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..321
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
CTGCAGGAGTCTGGACCTGGCCTGGTGGCGCCCTCACAGAGCCTGTCC 48
LeuGlnGluSerGlyProGlyLeuValAlaProSerGlnSerLeuSer
1 5 10 15
ATCACATGCACCGTCTCAGGGTTCTCATTAACTAGCTATGGTGTACAC 96
IleThrCysThrValSerGlyPheSerLeuThrSerTyrGlyValHis
20 25 30
TGGGTTCGCCAGCCTCCAGGAAAGGGTCTGGAGTGGCTGGTAGTGATA 144
TrpValArgGlnProProGlyLysGlyLeuGluTrpLeuValValIle
35 40 45
TGGAGTGATGGAAGCACAACCTATAATTCAGCTCTCAATTCCAGACTG 192
TrpSerAspGlySerThrThrTyrAsnSerAlaLeuAsnSerArgLeu
50 55 60
AGCATCAGCAAGGACAACTCCAAGAGCCAAGTTTTCTTAAAAATGAAC 240
SerIleSerLysAspAsnSerLysSerGlnValPheLeuLysMetAsn
65 70 75 80
AGTCTCCAAACTGATGACACAGCCATGTACTACTGTGCCAGAGAGCCT 288
SerLeuGlnThrAspAspThrAlaMetTyrTyrCysAlaArgGluPro
85 90 95
CCCACACGTACGTTTGCCTACTGGGGCCAAGGG 321
ProThrArgThrPheAlaTyrTrpGlyGlnGly
100 105
(2) INFORMATION FOR SEQ ID NO:12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 107 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
LeuGlnGluSerGlyProGlyLeuValAlaProSerGlnSerLeuSer
1 5 10 15
IleThrCysThrValSerGlyPheSerLeuThrSerTyrGlyValHis
20 25 30
TrpValArgGlnProProGlyLysGlyLeuGluTrpLeuValValIle
35 40 45
TrpSerAspGlySerThrThrTyrAsnSerAlaLeuAsnSerArgLeu
50 55 60
SerIleSerLysAspAsnSerLysSerGlnValPheLeuLysMetAsn
65 70 75 80
SerLeuGlnThrAspAspThrAlaMetTyrTyrCysAlaArgGluPro
85 90 95
ProThrArgThrPheAlaTyrTrpGlyGlnGly
100 105
(2) INFORMATION FOR SEQ ID NO:13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 330 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..330
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
CAGGCTGTTCTGACTCAGGAATTCGCACTCACCACATCACCTGGTGAA 48
GlnAlaValLeuThrGlnGluPheAlaLeuThrThrSerProGlyGlu
1 5 10 15
ACAGTCACACTCACTTGTCGCTCAAGTACTGGGGCTGTTACAACTAGT 96
ThrValThrLeuThrCysArgSerSerThrGlyAlaValThrThrSer
20 25 30
AACTATGCCAACTGGGTCCAAGAAAAACCAGATCATTTACTAACTGGT 144
AsnTyrAlaAsnTrpValGlnGluLysProAspHisLeuLeuThrGly
35 40 45
CTAATAGGTGGTACCAACAACCGAGCTCCAGGTGTTCCTGCCAGATTC 192
LeuIleGlyGlyThrAsnAsnArgAlaProGlyValProAlaArgPhe
50 55 60
TCAGGCTCCCTGATTGGAGACAAGGCTGCCCTCACTATCACAGGGGCA 240
SerGlySerLeuIleGlyAspLysAlaAlaLeuThrIleThrGlyAla
65 70 75 80
CAGACTGAGGATGAGGCAACATATTTCTGTGCTCTATGGTACAGCAAC 288
GlnThrGluAspGluAlaThrTyrPheCysAlaLeuTrpTyrSerAsn
85 90 95
CACTGGGTGTTCGGTGGAGGAACCAAACTGTCTAGACTAGGT 330
HisTrpValPheGlyGlyGlyThrLysLeuSerArgLeuGly
100 105 110
(2) INFORMATION FOR SEQ ID NO:14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 110 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
GlnAlaValLeuThrGlnGluPheAlaLeuThrThrSerProGlyGlu
1 5 10 15
ThrValThrLeuThrCysArgSerSerThrGlyAlaValThrThrSer
20 25 30
AsnTyrAlaAsnTrpValGlnGluLysProAspHisLeuLeuThrGly
35 40 45
LeuIleGlyGlyThrAsnAsnArgAlaProGlyValProAlaArgPhe
50 55 60
SerGlySerLeuIleGlyAspLysAlaAlaLeuThrIleThrGlyAla
65 70 75 80
GlnThrGluAspGluAlaThrTyrPheCysAlaLeuTrpTyrSerAsn
85 90 95
HisTrpValPheGlyGlyGlyThrLysLeuSerArgLeuGly
100 105 110
(2) INFORMATION FOR SEQ ID NO:15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
SerLeuThrSerTyrGlyValHisTrpValArg
1 5 10
(2) INFORMATION FOR SEQ ID NO:16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
TyrCysAlaArgGluProProThrArgThrPheAlaTyrTrpGlyGln
1 5 10 15
Gly
(2) INFORMATION FOR SEQ ID NO:17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
TrpLeuValValIleTrpSerAspGlySerThrThrTyrAsnSerAla
1 5 10 15
LeuAsnSerArgCysMet
20
(2) INFORMATION FOR SEQ ID NO:18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
TyrValThrSerAlaProAspThrArgProAlaProGlySerThr
1 5 10 15
__________________________________________________________________________
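The three-letter residue runs in the listing above can be hard to scan for the EPPT motif (SEQ ID NO:1). The following is a minimal illustrative sketch, not part of the original disclosure, that converts an unspaced three-letter run to one-letter codes and checks, purely at the string level, whether the result contains EPPT and is no longer than the 30-residue bound stated in the abstract. The function names `to_one_letter` and `contains_motif_within_limit` are hypothetical, and the check says nothing about whether a given peptide actually binds mucin.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

MOTIF = "EPPT"    # SEQ ID NO:1
MAX_LENGTH = 30   # "a peptide of up to 30 amino acids"


def to_one_letter(three_letter_run: str) -> str:
    """Convert an unspaced run such as 'GluProProThr' to 'EPPT'."""
    if len(three_letter_run) % 3 != 0:
        raise ValueError("input length must be a multiple of three")
    codes = [three_letter_run[i:i + 3] for i in range(0, len(three_letter_run), 3)]
    # A KeyError here means the run contains a non-standard residue code.
    return "".join(THREE_TO_ONE[code] for code in codes)


def contains_motif_within_limit(peptide: str) -> bool:
    """Hypothetical string-level check: EPPT present and length <= 30."""
    return MOTIF in peptide and len(peptide) <= MAX_LENGTH


# SEQ ID NO:16, copied from the sequence listing.
SEQ_ID_16 = "TyrCysAlaArgGluProProThrArgThrPheAlaTyrTrpGlyGlnGly"
peptide_16 = to_one_letter(SEQ_ID_16)
print(peptide_16, len(peptide_16), contains_motif_within_limit(peptide_16))
```

Applied to SEQ ID NO:16, the conversion yields a 17-residue string in which EPPT occupies positions 5 to 8; SEQ ID NO:18, by contrast, contains no EPPT run.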

I claim:
 1. A method for identifying or locating a mucin or a cell bearing said mucin, wherein said mucin is capable of being selectively bound by a molecule having an exposed amino acid sequence EPPT (SEQ ID NO:1) on the surface thereof, comprising exposing the cell or mucin to a molecule capable of binding mucin consisting of the amino acid sequence EPPT (SEQ ID NO:1) and further amino acids to form a peptide of up to 30 amino acids, wherein said molecule is detectably labeled, and detecting binding of said molecule to said cell or mucin.
 2. A molecule capable of binding mucin which consists of the amino acid sequence EPPT (SEQ ID NO:1).
 3. A molecule capable of binding mucin consisting of the amino acid sequence EPPT (SEQ ID NO:1) and further amino acids to form a peptide of up to 30 amino acids.
 4. A polynucleotide encoding the molecule of claim 2.